feat: Add CDC sync checkpointing based on time or records #21727
…ing CDC synchronization. For that purpose we encapsulate an AirbyteMessage Iterator on a new iterator that handles the checkpoint messaging.
Affected Connector Report
Connector | Version | Changelog | Publish |
---|---|---|---|
source-alloydb | 2.0.2 | ✅ | ✅ |
source-alloydb-strict-encrypt | 2.0.2 | 🔵 (ignored) | 🔵 (ignored) |
source-mssql | 1.0.0 | ✅ | ✅ |
source-mssql-strict-encrypt | 1.0.0 | 🔵 (ignored) | 🔵 (ignored) |
source-mysql | 2.0.0 | ✅ | ✅ |
source-mysql-strict-encrypt | 2.0.0 | 🔵 (ignored) | 🔵 (ignored) |
source-postgres | 2.0.2 | ✅ | ✅ |
source-postgres-strict-encrypt | 2.0.2 | 🔵 (ignored) | 🔵 (ignored) |
- See "Actionable Items" below for how to resolve warnings and errors.
✅ Destinations (0)
✅ Other Modules (0)
Actionable Items
Category | Status | Actionable Item |
---|---|---|
Version | ❌ mismatch | The version of the connector is different from its normal variant. Please bump the version of the connector. |
Version | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ⚠ doc not found | The connector does not seem to have a documentation file. This can be normal (e.g. a basic connector like source-jdbc is not published or documented). Please double-check to make sure that it is not a bug. |
Changelog | ❌ changelog missing | There is no changelog for the current version of the connector. If you are the author of the current version, please add a changelog. |
Publish | ⚠ not in seed | The connector is not in the seed file (e.g. `source_definitions.yaml`), so its publication status cannot be checked. This can be normal (e.g. some connectors are cloud-specific, and only listed in the cloud seed file). Please double-check to make sure that it is not a bug. |
Publish | ❌ diff seed version | The connector exists in the seed file, but the latest version is not listed there. This usually means that the latest version is not published. Please use the `/publish` command to publish the latest version. |
…state. Tests are failing as now it has more state messages: expected: <1> but was: <3>
…ip state message.
/test connector=connectors/source-postgres
Build Passed
/test connector=connectors/source-postgres-strict-encrypt
Build Passed
/test connector=connectors/source-alloydb
Build Passed
/test connector=connectors/source-alloydb-strict-encrypt
Build Passed
/test connector=connectors/source-mysql
Build Passed
/test connector=connectors/source-mysql-strict-encrypt
Build Passed
/test connector=connectors/source-mssql
Build Passed
/test connector=connectors/source-mssql-strict-encrypt
Build Passed
/publish connector=connectors/source-postgres
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-postgres-strict-encrypt auto-bump-version=false
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-alloydb
If you have connectors that successfully published but failed definition generation, follow step 4 here
/publish connector=connectors/source-alloydb-strict-encrypt auto-bump-version=false
If you have connectors that successfully published but failed definition generation, follow step 4 here
…nto sergio/feat/cdc-checkpointing
Manually bumped
…21727)
* This commit adds new functionality that generates checkpoints when doing CDC synchronization. For that purpose we encapsulate an AirbyteMessage Iterator on a new iterator that handles the checkpoint messaging.
* Reformat code
* Reformat code
* Reformat code
* Reformat code
* Second attempt with ugly if statement
* Add `isRecordBehindOffset` function to make sure is safe to send the state. Tests are failing as now it has more state messages: expected: <1> but was: <3>
* Code formatting
* Add additional check if the record is part of the snapshot load to skip state message.
* Remove comments
* Fix imports
* Fix format
* Add check if the iterator has extra elements so we don't send state message twice (edge case)
* Add a new check to avoid sending multiple state messages with same offset. Fix PR comments. Not sending checkpoints... figuring out
* Modify MSSQL and MySQL implementations
* Adds better control on Maps and include a test for time checkpoint. Also adds extra assert to verify there are no duplicate states
* Formatting
* Improve code documentation and use default for CdcStateHandler new functions
* Sort out missing `final` and types from comments
* Minor improve in checkpoint validation
* format files
* It's 2023!
* Import issues
* Changes after merging master
* Upgrade Debezium version in MySQL
* Bump Postgres and Alloydb
* auto-bump connector version
* Manually bumping version
---------
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
This PR adds new functionality that generates checkpoints when doing CDC synchronisation: #21009
To do so, instead of creating a new iterator on top of the current one, we just change the way we process the events.
We iterate over `ChangeEvent`s and figure out whether:
- the `lsn` of the record is higher than the one read by Debezium

If all the previous conditions are met, a new state message is sent in the next iteration. If it is also the end of the iterator, the state message is sent only once.
It is possible that a checkpoint will NOT be saved every X records/seconds as expected.
The process needs to validate other conditions before sending the `STATE` message (for example, that the record is not part of the snapshot load, or that the record we are publishing is actually after the offset we are going to send).
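
The checks described above can be illustrated with a minimal Java sketch. This is not the code from this PR: `CheckpointDecider`, `onRecord`, and all parameter names are hypothetical, LSNs are simplified to plain `long` values, and the duplicate-offset guard is only modeled on the commit note about not sending multiple state messages with the same offset.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: class, method, and parameter names are illustrative only
// and are not taken from the Airbyte codebase.
final class CheckpointDecider {

  private final long maxRecords;      // "every X records" threshold (assumed)
  private final Duration maxElapsed;  // "every X seconds" threshold (assumed)
  private long recordsSinceCheckpoint = 0;
  private Instant lastCheckpoint;
  private long lastSentOffset = Long.MIN_VALUE;

  CheckpointDecider(final long maxRecords, final Duration maxElapsed, final Instant start) {
    this.maxRecords = maxRecords;
    this.maxElapsed = maxElapsed;
    this.lastCheckpoint = start;
  }

  // Call once per change event; returns true when a STATE message may be sent.
  boolean onRecord(final long recordLsn, final long offsetLsn, final boolean isSnapshot, final Instant now) {
    recordsSinceCheckpoint++;
    final boolean thresholdReached = recordsSinceCheckpoint >= maxRecords
        || Duration.between(lastCheckpoint, now).compareTo(maxElapsed) >= 0;
    // No checkpoint before the threshold or while the snapshot load is running.
    if (!thresholdReached || isSnapshot) {
      return false;
    }
    // The record being published must be after the offset we are going to send.
    if (recordLsn <= offsetLsn) {
      return false;
    }
    // Avoid sending several state messages that carry the same offset.
    if (offsetLsn == lastSentOffset) {
      return false;
    }
    recordsSinceCheckpoint = 0;
    lastCheckpoint = now;
    lastSentOffset = offsetLsn;
    return true;
  }

  public static void main(final String[] args) {
    final Instant t0 = Instant.now();
    final CheckpointDecider decider = new CheckpointDecider(2, Duration.ofSeconds(10), t0);
    System.out.println(decider.onRecord(5, 4, false, t0)); // below the record threshold
    System.out.println(decider.onRecord(6, 4, false, t0)); // threshold reached, record is after the offset
  }
}
```

Because the counters are only reset when a state message is actually allowed, a threshold that is reached during the snapshot load or behind the offset stays pending until a later record passes every check, which is why checkpoints may arrive less often than every X records/seconds.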